I have a dataset of about 4 million sentence/named-entity pairs.
Each item looks like this:
Sentence: MarketWatch has reached out to Charles Schwab and GQG for comment.
Corresponding NER Tags:
[{'end': 6, 'entity': 'B-ORG', 'index': 1, 'score': '0.98322886', 'start': 0, 'word': 'Market'}
 {'end': 7, 'entity': 'I-ORG', 'index': 2, 'score': '0.969261', 'start': 6, 'word': '##W'}
 {'end': 11, 'entity': 'I-ORG', 'index': 3, 'score': '0.97644824', 'start': 7, 'word': '##atch'}
 {'end': 38, 'entity': 'B-PER', 'index': 8, 'score': '0.9927636', 'start': 31, 'word': 'Charles'}
 {'end': 41, 'entity': 'I-PER', 'index': 9, 'score': '0.99394774', 'start': 39, 'word': 'Sc'}
 {'end': 44, 'entity': 'I-PER', 'index': 10, 'score': '0.41437265', 'start': 41, 'word': '##hwa'}
 {'end': 45, 'entity': 'I-PER', 'index': 11, 'score': '0.46933985', 'start': 44, 'word': '##b'}
 {'end': 51, 'entity': 'B-ORG', 'index': 13, 'score': '0.9984176', 'start': 50, 'word': 'G'}
 {'end': 52, 'entity': 'I-ORG', 'index': 14, 'score': '0.99367344', 'start': 51, 'word': '##Q'}
 {'end': 53, 'entity': 'I-ORG', 'index': 15, 'score': '0.99617106', 'start': 52, 'word': '##G'}]
What would be a good approach to verify the correctness of each item?
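For context, the only checks that seem mechanically possible on this format are structural ones. Below is a minimal sketch of that idea, not a full answer to the question: it checks that each tag's offsets point at its word piece, that every I- tag continues a B-/I- tag of the same type, and that the score clears a threshold. The function name `check_tags`, the `min_score` default of 0.5, and the assumption that the tagger is cased (so word pieces match the sentence text exactly) are illustrative choices of mine, not part of the dataset.

```python
def check_tags(sentence, tags, min_score=0.5):
    """Return (token index, problem) pairs for one sentence (hypothetical helper)."""
    problems = []
    prev = None  # previous tagged token, used for the BIO check
    for tag in tags:
        # 1. Offsets should point at the word piece ("##" marks a continuation).
        word = tag["word"]
        piece = word[2:] if word.startswith("##") else word
        if sentence[tag["start"]:tag["end"]] != piece:
            problems.append((tag["index"], "offset mismatch"))
        # 2. An I- tag should directly continue a B-/I- tag of the same type.
        prefix, _, label = tag["entity"].partition("-")
        if prefix == "I":
            continues = (
                prev is not None
                and prev["index"] == tag["index"] - 1
                and prev["entity"].partition("-")[2] == label
            )
            if not continues:
                problems.append((tag["index"], "I- tag without matching B-"))
        # 3. Scores are stored as strings in the data; flag low-confidence ones.
        #    The threshold is an assumed value, not something from the dataset.
        if float(tag["score"]) < min_score:
            problems.append((tag["index"], "low score"))
        prev = tag
    return problems


sentence = "MarketWatch has reached out to Charles Schwab and GQG for comment."
# The four "Charles Schwab" tokens from the example item above.
sample_tags = [
    {'end': 38, 'entity': 'B-PER', 'index': 8, 'score': '0.9927636', 'start': 31, 'word': 'Charles'},
    {'end': 41, 'entity': 'I-PER', 'index': 9, 'score': '0.99394774', 'start': 39, 'word': 'Sc'},
    {'end': 44, 'entity': 'I-PER', 'index': 10, 'score': '0.41437265', 'start': 41, 'word': '##hwa'},
    {'end': 45, 'entity': 'I-PER', 'index': 11, 'score': '0.46933985', 'start': 44, 'word': '##b'},
]
print(check_tags(sentence, sample_tags))
# → [(10, 'low score'), (11, 'low score')]
```

Checks like these only catch inconsistent or low-confidence items; they cannot tell whether "Charles Schwab" should be PER or ORG here, which is the part I am really asking about.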
submitted by /u/shardblaster